Increased Coverage Obtained by Combination of Methods for Protein Sequence Database Searching

Authors

  • Caleb Webber
  • Geoffrey J. Barton
Abstract

Motivation: Sequence alignment methods that compare two sequences (pairwise methods) are important tools for detecting biological sequence relationships. In genome annotation, multiple methods are often run, and agreement between them is taken as confirmation. In this paper, we assess the advantages of combining search methods. We compare seven pairwise alignment methods: three local dynamic programming algorithms (PRSS, SSEARCH and SCANPS), two global dynamic programming algorithms (GSRCH and AMPS) and two heuristic approximations (BLAST and FASTA). Each is evaluated individually and through the pairwise intersection and union of their result lists at equal p-value cut-offs.

Results: When applied singly, the dynamic programming methods SCANPS and SSEARCH gave significantly better coverage (p = 0.01) than AMPS, GSRCH, PRSS, BLAST and FASTA. Ranking BLAST results by p-value gave significantly better coverage than ranking them by e-value. The study counted eight methods, treating BLAST ranked by p-value and BLAST ranked by e-value as separate methods. Their pairwise intersections and unions give 56 combinations, and 19 of these gave significant increases in coverage at low error compared with their parent methods at an equal p-value cut-off. The union of the BLAST (p-value) and FASTA results at an equal p-value cut-off gave significantly better coverage than either method alone. The best overall performance came from the intersection of the SSEARCH and GSRCH62 global alignment results. At an error level of five false positives, this combination found 444 true positives, a significant 12.4% increase over SSEARCH applied alone.
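
The combination scheme described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Each method's output is assumed to be a mapping from a (query, target) pair to a p-value. Applying one p-value cut-off to an intersection is modelled as keeping the larger of the two p-values, and applying it to a union as keeping the smaller. "Coverage at an error level of k false positives" is assumed to mean the number of true positives ranked ahead of the (k+1)-th false positive. The sketch also checks the combination count: eight methods give C(8,2) = 28 unordered pairs, each combined by intersection and by union, for 56 in total. The names `combine`, `coverage_at_errors`, `METHODS` and `COMBOS`, and the toy data, are hypothetical.

```python
from itertools import combinations
from math import comb
from typing import Dict, Set, Tuple

# Hypothetical representation: (query, target) pair -> p-value reported by one method.
Hits = Dict[Tuple[str, str], float]


def combine(a: Hits, b: Hits, mode: str) -> Hits:
    """Merge two hit lists so that a single p-value cut-off t applies to both methods.

    intersection: a pair passes t only if both methods report it with p <= t,
                  so its combined score is the larger p-value.
    union:        a pair passes t if either method reports it with p <= t,
                  so its combined score is the smaller p-value.
    """
    if mode == "intersection":
        return {k: max(a[k], b[k]) for k in a.keys() & b.keys()}
    if mode == "union":
        # A pair missing from one list is treated as unreported there (p = 1.0).
        return {k: min(a.get(k, 1.0), b.get(k, 1.0)) for k in a.keys() | b.keys()}
    raise ValueError(f"unknown mode: {mode!r}")


def coverage_at_errors(scores: Hits, true_pairs: Set[Tuple[str, str]], max_fp: int) -> int:
    """Count true positives ranked ahead of the (max_fp + 1)-th false positive.

    Hits are ranked by ascending p-value; tie handling is left arbitrary (assumption).
    """
    tp = fp = 0
    for pair, _ in sorted(scores.items(), key=lambda kv: kv[1]):
        if pair in true_pairs:
            tp += 1
        else:
            fp += 1
            if fp > max_fp:
                break
    return tp


# Eight ranked result lists, assuming BLAST is counted twice (p-value and e-value ranking).
METHODS = ["SSEARCH", "SCANPS", "PRSS", "GSRCH", "AMPS", "BLAST-p", "BLAST-e", "FASTA"]
COMBOS = [(x, y, op) for x, y in combinations(METHODS, 2) for op in ("intersection", "union")]
assert len(COMBOS) == comb(len(METHODS), 2) * 2 == 56

# Toy data, invented for illustration only.
a = {("q1", "t1"): 1e-5, ("q1", "t2"): 0.02, ("q1", "t3"): 1e-4}
b = {("q1", "t1"): 1e-3, ("q1", "t3"): 0.5, ("q1", "t4"): 1e-6}
true_pairs = {("q1", "t1"), ("q1", "t4")}

print(coverage_at_errors(combine(a, b, "intersection"), true_pairs, 0))  # -> 1
print(coverage_at_errors(combine(a, b, "union"), true_pairs, 0))         # -> 2
```

The toy numbers only exercise the code and say nothing about the paper's benchmark. In general, an intersection trades sensitivity for fewer false positives, while a union does the reverse. Which one wins depends on the data.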

Similar articles

Defining parameters for homology-tolerant database searching.

De novo interpretation of tandem mass spectrometry (MS/MS) spectra provides sequences for searching protein databases when limited sequence information is present in the database. Our objective was to define a strategy for this type of homology-tolerant database search. Homology searches, using MS-Homology software, were conducted with 20, 10, or 5 of the most abundant peptides from 9 proteins,...

Sequence clustering strategies improve remote homology recognitions while reducing search times.

Sequence databases are rapidly growing, thereby increasing the coverage of protein sequence space, but this coverage is uneven because most sequencing efforts have concentrated on a small number of organisms. The resulting granularity of sequence space creates many problems for profile-based sequence comparison programs. In this paper, we suggest several strategies that address these problems, ...

iProsite: an improved prosite database achieved by replacing ambiguous positions with more informative representations

The PROSITE database contains a set of entries corresponding to protein families, which are used to identify the family of a protein from its sequence. Although patterns and profiles are designed to be highly selective, each may produce false-positive or false-negative hits. Considering false positives as items that reduce the selectiveness of a pattern, then, the more selective pattern we have, a more accur...

Determination of primary structure and microheterogeneity of a beta-amyloid plaque-specific antibody using high-performance LC-tandem mass spectrometry.

Using the bottom-up approach and liquid chromatography (LC) in combination with mass spectrometry, the primary structure and sequence microheterogeneity of a plaque-specific anti-beta-amyloid (1-17) monoclonal antibody (clone 6E10) were characterized. This study describes the extent of structural information directly attainable by a high-performance LC-tandem mass spectrometric method in combina...

Prediction of Saccharomyces cerevisiae protein functional class from functional domain composition

MOTIVATION A key goal of genomics is to assign function to genes, especially for orphan sequences. RESULTS We compared the clustered functional domains in the SBASE database to each protein sequence using BLASTP. This representation for a protein is a vector, where each of the non-zero entries in the vector indicates a significant match between the sequence of interest and the SBASE domain. T...

Journal title:
  • Bioinformatics

Volume 19, Issue 11

Publication date: 2003